Abstract

We propose a simple abstract formalisation of the act of observation, 
in which the system and the observer are assumed to be in a pure state 
and their interaction deterministically changes the states such that the
outcome can be read from the state of the observer after the interaction. 
If the observer consistently realizes the outcome which maximizes the like- 
lihood ratio that the outcome pertains to the system under study (and 
not to his own state), he will be called Bayes-optimal. We calculate the
probability of each outcome when, for each trial of the experiment, the observer
is in a new state picked randomly from his set of states, and the system under
investigation is taken from an ensemble of identical pure states. For classical statistical
mixtures, the relative frequency resulting from the maximum likelihood 
principle is an unbiased estimator of the components of the mixture. For 
repeated Bayes-optimal observation in case the state space is complex 
Hilbert space, the relative frequency converges to the Born rule. Hence, 
the principle of Bayes-optimal observation can be regarded as an under- 
lying mechanism for the Born rule. We show the outcome assignment 
of the Bayes-optimal observer is invariant under unitary transformations 
and contextual, but the probability that results from repeated applica- 
tion is non-contextual. The proposal gives a concise interpretation for the 
meaning of the occurrence of a single outcome in a quantum experiment 
as the unique outcome that, relative to the state of the system, is least 
dependent on the state of the observer at the instant of measurement.



1 Introduction 

As early as 1935, Schrödinger wrote: "The rejection of realism has logical consequences.
In general, a variable has no definite value before I measure it; then
measuring it does not mean ascertaining the value that it has. But then what 
does it mean?" [28]. As the advent of quantum mechanics solved the long-standing
problem of providing an adequate description for several important
and unexplained experiments, the problem of realism in quantum mechanics 
was initially perceived mainly as a challenge to the construction of a new philosophy
of natural science. This perception is supported by the fact that almost
all later theoretical advances with experimental consequences came about without
any serious progress on this very basic problem. Yet at the same time,
a growing number of people recognized that progress in this problem would 
likely have deep consequences for the quantum-classical transition, the attempt 
to produce a successful unification of quantum mechanics and relativity theory, 
and the related problem of quantum cosmology. In the mid-sixties two important
advances were made. In 1964, John Bell showed that any local hidden
variable theory will yield predictions that are at odds with quantum mechanics. 
A few years later, Kochen and Specker [23] presented an explicit set of measurements
for which the simultaneous attribution of values to each of these
measurements leads to a logical contradiction. The two results can be regarded
as opposite faces of the same coin. Whereas Bell's result can be verified (or re- 
futed) by experiment, Kochen and Specker's argument shows the problem also 
to be a deeply-rooted theoretical one. These two results have been of such im- 
portance, that the notion of realism in quantum physics is usually considered 
automatically as having either the meaning of 'locally realistic' (Bell), or that 
of 'the impossibility of attributing predetermined outcome values to the set of 
observables' (Kochen and Specker). The apparent lack of realism in quantum 
mechanics has been illustrated again and again by clever theoretical constructions
ranging from Bell-type arguments to impossible coloring games, and by the
countless attempts to produce an experimental verification of these arguments
that is as loophole-free as possible^1.

However, the commonly accepted notion that "measuring a variable does 
not mean ascertaining the value that it has", does not mean that the answer 
to Schrödinger's question is that the occurrence of a particular outcome has
no meaning. Every proper quantum experiment is a testimony to the contrary, 
for if a single outcome has no informational content about the system at all, 
then how are we to derive anything at all from the sum of a great number of 
informationally empty statements? Whether we perform a tomographic state 
reconstruction, or experimentally estimate the value of a physical quantity of a 
system, we accept that in a well constructed experiment every outcome presents 
a piece of information, a piece of evidence, that brings us closer to the true state 
of affairs, whatever that may be. To give a more detailed answer to the question,
we are in need of a model that shows how a single outcome is obtained. We will
provide such a model in an attempt to understand the meaning of the occurrence
of a single outcome in a quantum mechanical experiment. More specifically, we
will show that an observer actively seeking to minimize his own influence on the
produced outcome will, with the aid of Bayesian decision theory, give outcomes
whose relative frequency converges to the Born rule in a natural way. This
in turn will give us a possible interpretation for the occurrence of a particular
outcome.

^1 Because local theories, by Bell's theorem, cannot give rise to some of the experimentally
verifiable predictions of quantum mechanics, the requirement of locality, or so-called
"local-realism", takes a prominent role. However, realism seems more fundamental than locality,
in the sense that the latter is only well-defined if we can attribute some form of reality with
respect to the whereabouts of the system. Moreover, the derivation of the quantum correlation
for most Bell-type experiments does not, at any point, invoke spatial coordinates. As far as
concerns the actual application of quantum theory, it is quite immaterial whether we calculate
the correlations between various outcomes that are obtained in a single location or at space-like
separated locations. Of course, for a locally realistic theory, the difference is huge.

2 Probabilities of outcomes for a single observable quantity

Let us assume we have a system $S$ for which we write $\Sigma_S$ to denote its set
of states, and $A$ for an observable that can take any single outcome out of $n$
distinct values in the outcome set $X = \{x_1, \ldots, x_n\}$. At the most trivial level,
there is a counting measure on the set of outcomes. If $\mathcal{P}(X)$ denotes the set of
all subsets of $X$, then the probability that a measurement of observable $A$ on
the system in a state $\psi \in \Sigma_S$ yields an outcome in a given subset $X_i \in \mathcal{P}(X)$
is a mapping

$$p(\cdot|\cdot) : \mathcal{P}(X) \times \Sigma_S \to [0,1] \qquad (1)$$

such that for disjoint $X_i \in \mathcal{P}(X)$, we have:

$$p(\cup_i X_i|\psi) = \sum_i p(X_i|\psi) \qquad (2)$$

The additive property described by (2) is generally accepted both in quantum
and classical probability and provides the rationale for the use of normalized
states, that is, states $\psi$ that satisfy:

$$p(X|\psi) = 1 \qquad (3)$$

In this way, (3) reduces the number of free parameters in state space by one.
We have written $p(x|\psi)$ to emphasize that it represents the probability that the
outcome $x$ obtains when (we know that) the system is prepared in the state
$\psi$. The classical interpretation of the origin of probabilities is one of a lack
of knowledge about the precise state being measured. From a naive epistemic
perspective, the outcome x is then an objective attribute of each measured 
state, and the probability related to each outcome is simply the fraction of 
states having the "x-attribute" in the ensemble of systems that we measure. 
As indicated in the introduction, such an interpretation for the probabilities 
in quantum mechanics is problematic. Even for a single spin 1/2 particle, one 
can show three measurements suffice to exclude such an interpretation, even 
without taking recourse to locality issues. 
2.1 Quantum probability for a single observable quantity



In orthodox quantum mechanics, the state space $\Sigma_S$ is the complex Hilbert
space $\mathcal{H}$. The set of states of the observed system that we will consider is the
set of unit vectors in an $n$-dimensional Hilbert space $\mathcal{H}_n$,

$$\Sigma_S = \{\psi \in \mathcal{H}_n : |\psi| = 1\} \qquad (4)$$

As usual, the norm $|\cdot|$ is defined through the (sesquilinear) inner product that
we will denote $\langle\cdot|\cdot\rangle$. Alternatively, one can take rays or even density operators
for the states. Since both lead to essentially the same results, we will stick to
unit norm vectors. Let $\mathcal{L}(\mathcal{H}_n)$ be the set of linear operators that act on the
elements of $\mathcal{H}_n$, then an observable $A$ is represented by a self-adjoint element
of $\mathcal{L}(\mathcal{H}_n)$:

$$A \in \mathcal{L}(\mathcal{H}_n) : A^\dagger = A \qquad (5)$$

Throughout this presentation, we assume $A$ has a discrete, finite, non-degenerate
spectrum, which implies that eigenvectors belonging to different eigenvalues are
orthogonal. Let $F_A$ be the set of the eigenvectors^2 of $A$

$$F_A = \{\psi_i \in \mathcal{H}_n : A|\psi_i\rangle = c_i|\psi_i\rangle,\ c_i \in \mathbb{R}\} \qquad (6)$$

We now have $\langle\psi_i|\psi_j\rangle = \delta_{ij}$ and $\sum_i |\psi_i\rangle\langle\psi_i| = I$ and, because the spectrum
is assumed non-degenerate, we have that $F_A$ is a basis or a complete orthonormal
frame. From linear algebra we know that an arbitrary element $\psi^s$ of $\mathcal{H}_n$ can be
written in this frame $F_A$ as:

$$|\psi^s\rangle = \sum_{i=1}^{n} a_i |\psi_i\rangle \qquad (7)$$

If $\psi^s$ is normalized, then it lies in $\Sigma_S \subset \mathcal{H}_n$, and the $a$'s obey:

$$\sum_i a_i a_i^* = 1 \qquad (8)$$

Moreover, one can easily verify that the observable $A$ can be written as

$$A = \sum_i c_i |\psi_i\rangle\langle\psi_i| \qquad (9)$$

Hence the observable $A$ is in a one-to-one correspondence with an orthonormal
frame $F_A$ of eigenvectors of $A$ and we will represent the observable by its
associated frame. Throughout this paper, we reserve superscripts of states as
a mnemotechnical aid for system recognition (i.e. $\psi^s$ is a system state and
$\psi^m$ the state of the measurement apparatus) and subscripts of states to denote
eigenstates. If a system is in an eigenstate corresponding to outcome $x_i$, we will
denote the corresponding eigenstate as $\psi_i$. For an arbitrary eigenstate $\psi_j$ we
have

$$p(x_i|\psi_j) = \delta_{ij} \qquad (10)$$

^2 Again, we neglect mathematical details with regard to phase issues and identify all $\psi$ with
the same eigenvalue $c_i$ as the same eigenvector $\psi_i$.

Thus for an eigenstate, and also for a statistical mixture of eigenstates, the
classical interpretation of probability as "proportion of systems having the x-attribute"
is tenable. The more interesting case, however, is the probability for
the occurrence of an outcome $x_i$ when the system is in a general state (7), which
is given by the Born rule:

$$p(x_i|\psi^s) = |\langle\psi_i|\psi^s\rangle|^2 = |a_i|^2 \qquad (11)$$

The analog with the classical situation would be that $\psi^s$ represents a mixture
of states that have attribute $x_k$ in the right proportion such that the Born rule
holds. However, the Born rule holds even when the system is in a pure state,
i.e. a state which cannot be obtained as a statistical mixture of states. We 
will show that it is possible to regard the probabilities as arising from a lack 
of knowledge about the detailed state of the observer if the observer actively 
attempts to choose the outcome that maximizes a specific likelihood ratio that 
we will present shortly. 
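
As a small numerical illustration of the Born rule (11), the following Python sketch computes the outcome probabilities from the expansion coefficients $a_i$ of a state in the eigenframe of the observable. The function name `born_probabilities` and the example amplitudes are hypothetical, chosen only so that the normalization (8) holds; they are not part of the formal development.

```python
import math


def born_probabilities(amplitudes):
    """Return |a_i|^2 for the expansion coefficients a_i of a unit vector.

    Assumption: the coefficients refer to the orthonormal eigenframe of the
    observable, so the normalization sum_i a_i a_i^* = 1 must hold.
    """
    norm_sq = sum(abs(a) ** 2 for a in amplitudes)
    if not math.isclose(norm_sq, 1.0, rel_tol=1e-9):
        raise ValueError("state is not normalized")
    return [abs(a) ** 2 for a in amplitudes]


# Hypothetical state in a 3-dimensional Hilbert space (one complex phase).
amplitudes = [complex(0.6, 0.0), complex(0.0, 0.64), complex(0.48, 0.0)]
probs = born_probabilities(amplitudes)
```

Note that the phase of the second coefficient plays no role in the result, in line with the fact that only $|a_i|^2$ enters (11).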

3 The process of observation 
3.1 The deterministic observer 

Let us first define what we mean by an observer. An observer is a physical 
system that takes a question as input, and yields in reply an outcome which 
is a member of a discrete set. This outcome can be freely copied, and hence 
communicated to many other observers. In general, this definition of observer 
will include the experimental setup, apparata, sensors, and the human operator. 
It is however quite irrelevant to our purposes whether we consider an apparatus 
or a detector, an animal or a human being as observer, as long as we agree that it 
is this system that has produced the outcome. We will furthermore assume the 
observer comes to this outcome through a physical, deterministic interaction. 
That is, if we have perfect knowledge of the initial state of the system and of the 
potentials that act on the system, we can in principle predict the future state of 
the system perfectly. Besides the fact that all fundamental theories of physics 
(even classical chaotic systems and quantum dynamics) postulate deterministic 
evolution laws, the requirement of determinism allows us to derive probability as a
secondary concept. So let us assume that the outcome of an observation is the 
result of a deterministic interaction: 

$$\tau : \Sigma_S \times \Sigma_M \to X \qquad (12)$$

Here $\tau$ is the interaction rule, $\Sigma_S$ is the set of states of the observed system,
$\Sigma_M$ the set of states of the observing system and $X$ the set of outcomes that
observable $A$ can have. We will deal only with a single observable, so no further
notational reference is made to the particular observable. The mapping $\tau$
encodes how an observer in a state $\psi^m \in \Sigma_M$, observing a system in the state
$\psi^s \in \Sigma_S$, comes to the outcome $x \in X$. Because our observer is deterministic,
we assume $\tau$ is single-valued. Probability will only arise as a lack of knowledge
on deterministic events. The observer faces the task of selecting an outcome 
from the set X that tells something about the system under observation. But because
the outcome is always formulated by the observer, it has to be encoded somehow
in the state of the observer after the observation. Hence the outcome itself is 
also an observable quantity of the post-measurement state of the observer. The 
outcome will then have to share its story among the two participating systems 
that gave rise to its existence: it will always have something to say about both 
the observer and the system under study. In [5] it was shown by a diagonal
argument that even in the simplest case of a perfect observer, observing
only classical properties^3, there exist classical properties pertaining to himself
that he cannot perfectly observe. More specifically, even if the observer can 
observe a given (classical) property perfectly, he cannot perfectly observe that 
he observes this classical property perfectly. There is no logical certainty with 
respect to faithfulness of a single shot, deterministic observation. On the other 
hand, observation is an absolutely indispensable part of doing science, hence 
it is only natural that every scientist believes that faithful observation can and 
does indeed occur. Living in the real world, somewhere between the extremes of 
the ideal and the impossible, we wonder whether there is a strategy for the ob- 
server so that he is guaranteed that each outcome he picks uses his observational 
powers to the best of his ability. 

3.2 Repeated measurement and the randomization of probe states of the observer

Rather than attempting to measure observables in a single trial of an experi- 
ment, our observer turns to a new strategy. First he prepares an ensemble of 
a large number of identical system states. Next he will interact with each of 
the members of this ensemble in turn. For each and every single interaction, 
he will pick the outcome that somehow 'has the largest likelihood' of pertaining 
to the system. By randomizing his probe state and picking the outcomes in 
this way, the observer hopes to restore objectivity, so that he will eventually 
obtain information that pertains solely to the system under observation. To
calculate $p(x|\psi^s)$ within the deterministic setting (12) of the previous section
is in principle straightforward. The experiment our observer will perform is a

^3 We say the property a of a system S in the state s is actual, iff the testing of property a
for S in the state s would yield an affirmation with certainty. A property is called classical
when the outcome of the observation to test that property was predetermined by the state of
the system (whatever that state was) prior to the test. For a classical property we can define
a negation in the lattice of properties that is simply the Boolean NOT. A property a is then 
classical for S iff for each state of S the property, or its negation, is actual. For details, see 

m 
repeated one, in which the set of states of the system under study is reduced to
a singleton, and the set of states for the observer is the whole of $\Sigma_M$. The set of
states for the observer that leads to a given outcome $x \in X$ when the observer
observes a system in the state $\psi^s$ ^4 will be denoted as $eig(x, \psi^s)$:

$$eig(x, \psi^s) = \{\psi^m \in \Sigma_M : \tau(\psi^s, \psi^m) = x\} \qquad (13)$$

From the single-valuedness of $\tau$ in (12), we have for $x_i \neq x_j$:

$$eig(x_i, \psi^s) \cap eig(x_j, \psi^s) = \emptyset \qquad (14)$$

If we assume that the act of observation of an observable leads to an outcome
for every state of the system investigated, we have

$$\cup_{i=1}^{n} eig(x_i, \psi^s) = \Sigma_M \qquad (15)$$

In this way $\tau$ defines in a trivial way a partition of the state space of the observer
with each member $eig(x_i, \psi^s)$ in the partition belonging to exactly one outcome.
We are now ready to introduce probability. With $\mathcal{B}(\Sigma_M)$ a $\sigma$-algebra of Borel
subsets of $\Sigma_M$ (which we tacitly assume includes $eig(x_i, \psi^s)$ for every $i$), we
define a probability measure $\mu$ that acts on the measure space $(\Sigma_M, \mathcal{B}(\Sigma_M))$.
For any two disjoint $\sigma_i, \sigma_j$ in $\mathcal{B}(\Sigma_M)$, we have

$$\mu : \mathcal{B}(\Sigma_M) \to [0,1] \qquad (16)$$
$$\mu(\sigma_i \cup \sigma_j) = \mu(\sigma_i) + \mu(\sigma_j)$$
$$\mu(\Sigma_M) = 1$$

In order to calculate $p(x|\psi^s)$, we need to evaluate the probability measure
over the set of states for the observer giving rise to the outcome $x$ when they
interact with a state $\psi^s$:

$$p(x|\psi^s) = \mu(eig(x, \psi^s)) / \mu(\cup_{i=1}^{n} eig(x_i, \psi^s)) \qquad (17)$$
$$= \mu(eig(x, \psi^s)) \qquad (18)$$

This last formula is fundamental to this paper. It says that for a repeated
experiment on a set of identical pure system states, the probability $p(x|\psi^s)$ is
given as the ratio of observer states that, given $\psi^s$, tell the outcome is $x$, to the
total number of observer states.
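
The counting character of (17) and (18) can be made concrete with a toy model. The sketch below is a minimal illustration under strong assumptions: the observer state space is replaced by a finite set carrying the uniform counting measure, and the interaction rule `toy_tau` is a hypothetical threshold rule invented for the example. Only the structure "fraction of observer states that lead to outcome x" is taken from the text.

```python
from collections import Counter
from fractions import Fraction


def outcome_probabilities(tau, system_state, observer_states):
    """Fraction of observer states that tau maps to each outcome.

    Assumption: a finite observer state set with the uniform counting
    measure stands in for the measure mu on the observer state space.
    """
    states = list(observer_states)
    counts = Counter(tau(system_state, m) for m in states)
    return {x: Fraction(c, len(states)) for x, c in counts.items()}


def toy_tau(s, m):
    """Hypothetical deterministic rule: 'x1' if the observer state m lies
    below the system parameter s, otherwise 'x2'."""
    return "x1" if m < s else "x2"


observer_states = range(10)  # stand-in for the observer state space
p = outcome_probabilities(toy_tau, 3, observer_states)
```

Because `toy_tau` is single-valued, the observer states are partitioned by outcome and the resulting fractions add up to one, mirroring (14) and (15).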

Note that the sets $eig(x_i, \psi^s)$ are not sets of eigenvectors in the algebraic
sense of the word^5. However, if it happens to be the case that, for a given $\psi^s$
and for almost every $\psi^m \in \Sigma_M$, we have $\tau(\psi^s, \psi^m) = x_k$ in the sense that

$$\mu(eig(x_k, \psi^s)) = \mu(\Sigma_M) \qquad (19)$$

then, for that particular $\psi^s$, we have $p(x_k|\psi^s) = 1$. The vector $\psi^s$ thus defined
will coincide with a regular eigenvector if the state space is a Hilbert space.

^4 In accordance with the literature on the subject, we used Dirac's bra-ket notation for our
brief introduction to quantum probability. In what follows we will not make use of the duality
between a Hilbert space and the space of linear functionals on this Hilbert space, so all vectors
are written without brackets.

^5 The sets $eig(x_i, \psi^s)$ are called eigensets in accordance with [1].

The relation between (16) and (1) is through the mapping $\tau$ and the measure
$\mu$. It is obvious that (17) is additive in $X$ too

$$\mu(eig(x_i, \psi^s) \cup eig(x_j, \psi^s)) = \mu(eig(x_i, \psi^s)) + \mu(eig(x_j, \psi^s)) \qquad (20)$$

because of (14). Hence, if the probabilities of (16) and (1) coincide for every
single outcome (the singletons in $\mathcal{P}(X)$), they will coincide for all of $\mathcal{P}(X)$. In
what follows we will therefore restrict our discussion to the probability related
to the occurrence of a single outcome. In conclusion, the success of the program
to model the probabilities in quantum mechanics as coming from a lack
of knowledge about the precise state of the observer stands or falls with the
question of defining a natural mapping $\tau$ (which determines the outcome and
hence $eig(x, \psi^s)$) such that the measure $\mu$ of the eigenset $eig(x_i, \psi^s)$ pertaining
to outcome $x_i$ is identical with the probability obtained by the Born rule.

3.3 The Bayes-optimal observer 

We can see from (17) that the system state can be associated with a probability
in a fairly trivial way: the probability of an outcome $x$ when the system is
in a pure state $\psi^s$ is the proportion of observer states that attribute outcome $x$
to that state. Even for a repeated measurement on a set of identical pure states,
fluctuations in the outcomes can arise if there is a lack of knowledge concerning
the precise state of the observer. Suppose now the observer, considered as a
system in its own right, is in a state $\psi^m$. Then in exactly the same way we can
associate a probability with that state too. The operational meaning of this
association is given either by a secondary observer observing an ensemble of observers
in the state $\psi^m$, or by the observer consistently (mis)identifying his own
state $\psi^m$ for a state of the system $\psi^s$. We have argued that every outcome will
say something about the observer (that is, about $\psi^m$), and something about
the system (that is, about $\psi^s$). The problem is that this information is mixed
up in a single outcome. Some outcomes will contain more information about 
the state of the system, and some more about the state of the apparatus. Even- 
tually, we, as operators of our detection apparatus, will have to decide whether 
we will retain a given outcome, or reject it. Such decisions are a vital part 
of experimental science. For example, an outcome that is deemed too far off 
the limit (so-called outliers), is rejected and hence excluded in the subsequent 
analysis. The rationale for this exclusion is that an outlier does not contain 
information about the system we seek to investigate, but rather that it repre- 
sents a peculiarity of the measurement. In practice, rejection or acceptance of 
an outcome does not depend on a rational analysis, but on the common sense 
and expectations of the experimenter. Suppose however, that the observer does
have absolute knowledge about the state of the system $\psi^s$ and his own state
$\psi^m$, and recognizes the fact that the outcome he delivers may eventually be rejected.
The observer considers this rejection to be based on the following binary
hypotheses:

$$H_0 : \text{the outcome } x_i \text{ was inferred from } \psi^s \qquad (21)$$
$$H_1 : \text{the outcome } x_i \text{ was inferred from } \psi^m$$

In full, the hypotheses should actually read: "The outcome $x_i$ arises as a
consequence of the observer attributing the state $\psi^s$ (or $\psi^m$) to the system".
To combat rejection, the observer chooses the outcome that maximizes the likelihood
that $H_0$ prevails, as if the outcome he delivers will eventually be judged
for acceptance or rejection by one with absolute knowledge about $\psi^s$ and $\psi^m$.
If, in an experiment, it is possible (with non-vanishing probability) to get an
outcome $x_i$ under either hypothesis, then a factual occurrence of this outcome
in an experiment supports both hypotheses simultaneously. What really matters
in deciding between $H_0$ and $H_1$ on the basis of a single outcome is not the
probability of the correctness of each hypothesis itself, but rather whether one
hypothesis has become more likely than the other as a result of getting outcome
$x_i$. From Bayesian decision theory we have that all the information in
the data that is relevant for deciding between $H_0$ and $H_1$ is contained in the
so-called likelihood ratios or, in the binary case, the odds $\Lambda_i$:

$$\Lambda_i = \frac{p(x_i|\psi^s)}{p(x_i|\psi^m)} \qquad (22)$$

In this last formula, the numerator and denominator are given by (17). We are
now in position to state our proposed strategy for the Bayes-optimal observer. 

Definition 1 (Bayes-optimal observer) We call a system $M$ in a state $\psi^m$
a Bayes-optimal observer iff, after an interaction with a system in a state $\psi^s$,
the state of $M$ will transform to a state that expresses the outcome $x_i$ that
corresponds to the maximal likelihood ratio $\Lambda_i$ (22).

Picking the outcome $x_i$ from $X$ that maximizes the corresponding likelihood
ratio $\Lambda_i$ is simply optimizing the odds for $H_0$, given his information. This
concludes our description of the observer. To see what probability arises for a 
repeated experiment when an observer is Bayes-optimal, we need a state space. 
We are especially interested in complex Hilbert space, but we will first have a 
look at statistical mixtures. 
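
Before specializing to a state space, the decision rule of Definition 1 can be written down as a short Python sketch. It assumes that the outcome probabilities attached to the system state and to the observer state are already available as two lists; the function names, the tie-breaking rule and the example numbers are hypothetical and serve only to illustrate the selection of the maximal likelihood ratio.

```python
def likelihood_ratios(p_system, p_observer):
    """Ratio p(x_i | system state) / p(x_i | observer state) per outcome.

    Assumption: every observer probability is strictly positive, so that
    each ratio is well defined.
    """
    if len(p_system) != len(p_observer):
        raise ValueError("both states must range over the same outcome set")
    if any(q <= 0 for q in p_observer):
        raise ValueError("observer probabilities must be strictly positive")
    return [p / q for p, q in zip(p_system, p_observer)]


def bayes_optimal_outcome(p_system, p_observer):
    """Index of the outcome with the maximal likelihood ratio.

    Ties go to the lowest index; this tie rule is an arbitrary choice
    made for the sketch and is not specified in the text.
    """
    ratios = likelihood_ratios(p_system, p_observer)
    return max(range(len(ratios)), key=ratios.__getitem__)


# Hypothetical outcome probabilities for the system and for the observer.
p_system = [0.5, 0.3, 0.2]
p_observer = [0.2, 0.5, 0.3]
chosen = bayes_optimal_outcome(p_system, p_observer)
```

The observer here favours the second outcome on his own account, yet the rule selects the first outcome, because that is where the evidence for $H_0$ relative to $H_1$ is strongest.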

3.4 The Bayes-optimal observer for statistical mixtures 

If the conditional probabilities $p(x_i|\psi^s)$ are well-defined (which we will just
accept for now), we can make a summary of them in a single vector $\mathbf{x}(\psi^s)$:

$$\mathbf{x}(\psi^s) = \sum_{i=1}^{n} p(x_i|\psi^s)\, x_i \qquad (23)$$

First we define the convex closure of a number of elements $a_1, \ldots, a_n \in \mathbb{R}^n$:

$$[a_1, \ldots, a_n] = \{a \in \mathbb{R}^n : a = \sum_i \lambda_i a_i,\ 0 \le \lambda_i \in \mathbb{R},\ \sum_i \lambda_i = 1\} \qquad (24)$$

If we write $[C]$, as we shortly will, we mean the convex closure of the elements
in $C$. The standard $(n-1)$-simplex $\Delta_{n-1}$ generated by the outcome set $X$ is:

$$\Delta_{n-1}(X) = [x_1, \ldots, x_n] \qquad (25)$$

We see from (23) and the normalization of the probabilities that $\mathbf{x}(\psi^s)$ belongs
to $\Delta_{n-1}(X)$. By identification of
the axes of $\mathbb{R}^n$ with the members of $X$, we have $\Delta_{n-1}(X) \subset \mathbb{R}^n(X)$, the free
vector space generated by the outcome set $X$. Vectors like $\mathbf{x}(\psi^s)$ are often called
'statistical states' or 'mixtures' in the literature. Suppose now that all we can or
care to know about the system $S$ and the observer $M$ are the statistical states,
i.e. the probabilities related to the outcomes of a single experiment. Within
this constraint, the vector $\mathbf{x}(\psi^s)$ represents all there is to know about $S$ and the
state spaces $\Sigma_S$ and $\Sigma_M$ reduce to $\Delta_{n-1}(X)$:

$$\Sigma_S = \Sigma_M = \Delta_{n-1}(X) \qquad (26)$$

Having identified $\psi^s$ with $\mathbf{x}(\psi^s)$ in this particular case, the conditional
probability $p(x_i|\mathbf{x}(\psi^s))$ denotes the probability that outcome $x_i$ occurs when our
knowledge about the system is encoded in the statistical state $\mathbf{x}(\psi^s)$:

$$\mathbf{x}(\psi^s) = p(x_1|\mathbf{x}(\psi^s))\,x_1 + \ldots + p(x_n|\mathbf{x}(\psi^s))\,x_n \qquad (27)$$

In this section $\langle\cdot,\cdot\rangle$ denotes the standard inner product in Euclidean space, and
with $\langle x_i, x_j\rangle = \delta_{ij}$, we have from this last equation

$$p(x_i|\mathbf{x}(\psi^s)) = \langle \mathbf{x}(\psi^s), x_i\rangle \qquad (28)$$

For a statistical state, the magnitude of the $i$-th coordinate equals the probability
of outcome $x_i$. We have a state space and we have a rule to extract a
probability from a state (28), so we can characterize the sets $eig(x_k, \mathbf{x}(\psi^s))$. Let
$\mathbf{x}(\psi^s)$ and $\mathbf{x}(\psi^m)$ be arbitrary states in $\Delta_{n-1}(X)$, written as:

$$\mathbf{x}(\psi^s) = \sum_{i=1}^{n} t_i x_i, \qquad \mathbf{x}(\psi^m) = \sum_{i=1}^{n} r_i x_i \qquad (29)$$



By the definition of Bayes-optimal observation, we have that the outcome $x_k$ is
chosen if, for all $j \neq k$, the corresponding likelihood ratios satisfy $\Lambda_k > \Lambda_j$. By
(22) and (23), $x_k$ is chosen iff for all $j = 1, \ldots, n$ ($j \neq k$), we have:

$$\frac{p(x_k|\mathbf{x}(\psi^s))}{p(x_k|\mathbf{x}(\psi^m))} > \frac{p(x_j|\mathbf{x}(\psi^s))}{p(x_j|\mathbf{x}(\psi^m))} \qquad (30)$$

The regions $eig(x_k, \mathbf{x}(\psi^s))$ are found by substitution of (29) in (28) and then
into (30). With $j = 1, \ldots, n$; $j \neq k$, we obtain:

$$eig(x_k, \mathbf{x}(\psi^s)) = \{\mathbf{x}(\psi^m) \in \Delta_{n-1} : \frac{t_k}{r_k} > \frac{t_j}{r_j}\} \qquad (31)$$






According to (17), the probability of the outcome $x$ for the repeated experiment
on a set of identical system states is the ratio of observer states that tell the
outcome is $x$, to the total number of observer states. Because the state space
is Euclidean, it is natural to take for $\mu$ the $(n-1)$-Lebesgue measure in $\Delta_{n-1}$,
assumed to be normalized: $\mu(\Delta_{n-1}(X)) = 1$. The probability $p_{BO}(x_k|\mathbf{x}(\psi^s))$
that the Bayes-optimal observer obtains the outcome $x_k$ is then given by

$$p_{BO}(x_k|\mathbf{x}(\psi^s)) = \mu(eig(x_k, \mathbf{x}(\psi^s))) \qquad (32)$$

However, because of the way we defined the statistical state, the probability is
also given directly by components of the state. So the question is whether the
Bayes-optimal observer (32) can recover that probability, i.e. is it true that (32)
equals (28):

$$\mu(eig(x_k, \mathbf{x}(\psi^s))) = \langle \mathbf{x}(\psi^s), x_k\rangle \qquad (33)$$

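To see if this is the case, we first define the open convex closure of a number of
elements $x_1, \ldots, x_n \in \mathbb{R}^n$ as

$$]x_1, \ldots, x_n[ \; = \{x \in \mathbb{R}^n : x = \sum_i \lambda_i x_i,\ 0 < \lambda_i \in \mathbb{R},\ \sum_i \lambda_i = 1\} \qquad (34)$$

We can now characterize $eig(x_k, \mathbf{x}(\psi^s))$ for the statistical state as being 'almost
equal' to

$$C_k^s = \; ]x_1, \ldots, x_{k-1}, \mathbf{x}(\psi^s), x_{k+1}, \ldots, x_n[ \qquad (35)$$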

A graphical representation of the eigensets in the simplex state space can be 
found in Figure (1). 

Lemma 2 Let $C_k^s$ be defined as in (35), $[C_k^s]$ be the convex closure of $C_k^s$, and
$eig(x_k, \mathbf{x}(\psi^s))$ by (31), then:

$$C_k^s \subset eig(x_k, \mathbf{x}(\psi^s)) \subset [C_k^s]$$

The proof of this lemma can be found in appendix A. To obtain the probability
(32), we calculate the $\mu$-measure of $[C_k^s]$, which is simply the $(n-1)$-dimensional
volume of the simplex $[C_k^s]$.

Lemma 3 If $\mu$ is a (probability) measure such that $\mu(\Delta_{n-1}(X)) = 1$, and $[C_k^s]$
is defined by the convex closure of (35), then we have $\mu([C_k^s]) = t_k$.

One can calculate the volume of a simplex straightforwardly by determinant
calculus, as was done in [2]. For completeness, we have included an
alternative in the form of a simple geometric argument in appendix B. We then
easily obtain:

Theorem 4 $\mu(eig(x_k, \psi^s)) = t_k$

Proof. By the first lemma, we have $C_k^s \subset eig(x_k, \mathbf{x}(\psi^s)) \subset [C_k^s]$. Because
$A \subset B \Rightarrow \mu(A) \le \mu(B)$ we have

$$\mu(C_k^s) \le \mu(eig(x_k, \mathbf{x}(\psi^s))) \le \mu([C_k^s])$$

Figure 1: Illustration of the scheme in the simplex state space. We start with
the discrete outcome set, depicted in figure (a). The state space for an outcome
set with three outcomes is the standard 2-simplex in the free vector space
generated by the outcome set over the field of real numbers, as depicted in
picture (b). In figure (c), we see the eigensets $C_k^s$, so we can see what outcome
will be obtained from a Bayes-optimal measurement. An apparatus state picked
from the darkest region, $C_2^s$, will lead to the outcome $x_2$, in the lightest region,
$C_1^s$, to $x_1$, and the intermediately shaded region, $C_3^s$, leads to the outcome $x_3$.
The probability is the Lebesgue measure over the depicted eigensets. I.e., the
probability of obtaining the outcome $x_2$ is the normalized area of the darkest
triangle.

Figure 2: A graphical exposition of the contextuality of the outcome assignment.
(a) If we pick an observer state from the open white triangle $]\psi^s, x_2, x_3[$, then
a measurement of the state $\psi^s$ will yield outcome $x_1$. (b) If we interchange the
second and third component of $\psi^s$, we obtain a new state. The probability of obtaining
outcome $x_1$ is the same as in picture (a), because the two triangles have the
same area. However, an observer state chosen from the black shaded region
would yield outcome $x_1$ in picture (a), whereas it would yield $x_2$ in picture (b).
Note that we did not change the $x_1$ component in the state $\psi^s$ to obtain the new state.

By the second lemma we have $\mu([C_k^s]) = t_k$. To calculate $\mu(C_k^s)$, we note that
$\mu(C_k^s) = \mu([C_k^s]) - \mu([C_k^s] \setminus C_k^s)$. Because $[C_k^s] \setminus C_k^s$ is the collection of faces of $C_k^s$,
a set of finite cardinality whose members have an affine dimension maximally
equal to $n-2$, it is $\mu$-negligible, hence we also have $\mu(C_k^s) = t_k$, establishing
the result. ■

We see that indeed the Bayes-optimal observer recovers the probability that 
was encoded in the statistical state: 

$$p_{BO}(x_k|\mathbf{x}(\psi^s)) = t_k = p(x_k|\mathbf{x}(\psi^s))$$
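
The statement of Theorem 4 lends itself to a numerical check. The following Python sketch is a Monte Carlo illustration under stated assumptions: observer states are drawn uniformly from the simplex by normalizing independent exponential variates (a standard way of sampling the flat Dirichlet distribution), the system state `t`, the number of trials and the seed are hypothetical, and the treatment of a zero observer coefficient is an arbitrary convention of the sketch. The last lines illustrate the contextuality of the single outcome assignment discussed in the caption of Figure 2, using a hypothetical fixed observer state.

```python
import random


def sample_simplex(n, rng):
    """Draw a point uniformly (w.r.t. Lebesgue measure) from the standard
    (n-1)-simplex by normalizing n independent exponential variates."""
    w = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]


def bayes_optimal_outcome(t, r):
    """Index k maximizing t_k / r_k; a zero r_k counts as an infinite
    ratio (a convention of this sketch)."""
    def ratio(k):
        return t[k] / r[k] if r[k] > 0 else float("inf")
    return max(range(len(t)), key=ratio)


def relative_frequencies(t, trials, seed=0):
    """Relative outcome frequencies for repeated Bayes-optimal observation
    of the fixed system state t, with a fresh observer state per trial."""
    rng = random.Random(seed)
    counts = [0] * len(t)
    for _ in range(trials):
        r = sample_simplex(len(t), rng)
        counts[bayes_optimal_outcome(t, r)] += 1
    return [c / trials for c in counts]


t = [0.5, 0.3, 0.2]  # hypothetical statistical state of the system
freqs = relative_frequencies(t, trials=100_000)

# Contextuality of the single outcome: swapping t_2 and t_3 leaves t_1
# untouched, yet changes the outcome for this fixed observer state.
r_fixed = [0.4, 0.4, 0.2]
before = bayes_optimal_outcome([0.5, 0.3, 0.2], r_fixed)  # index 0
after = bayes_optimal_outcome([0.5, 0.2, 0.3], r_fixed)   # index 2
```

With this many trials the entries of `freqs` should lie close to the components of `t`, while `before` and `after` differ even though the first component of the system state is the same in both calls.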

In this way the observer succeeds in obtaining a quantity that, in the limit 
of infinite measurements, depends only on the state of the system under investi- 
gation, and not on his own state. The results we have obtained for the simplex 
state space are identical to those in [2], where the scheme was proposed under
the name "hidden measurements" to indicate the origin of the lack of knowledge.
In [2] the eigensets are postulated ad hoc, whereas we have derived their
simplicial shape from the principle of Bayes-optimal observation. We will use
this principle in the next section to extend the results of [2] to systems with a
complex state space. 

Before we do so, two remarks are in order. First, we did not specify whether
the state $\mathbf{x}(\psi^s)$ is the result of mixing 'pure' components with appropriate
weights, as indicated by the components of the state, or whether it represents
a statistical tendency, somewhat like a propensity, of an ensemble of identical
'pure' states to reveal itself in the different outcomes. That is, if all we are
allowed to do is perform a single experiment on each member of the ensemble, 
then from the resulting statistics of a single observable, we cannot distinguish 
between these two situations. In other words, if we have an urn filled with coins 
and we are allowed to inspect the coin only after a single throw of the coin, for 
every coin in the urn, then we cannot know whether it is a tendency of the coin 
to show heads with probability 1/2, or whether half of the coins have both sides 
heads and half of them have both sides tails (or indeed a mixture of these two 
situations). Secondly, it is interesting that, even for the conceptually simple
statistical mixtures, the outcome assignment given by the Bayes-optimal observer
is contextual in the following sense: given states of the observer and the system
that lead to the outcome $x_i$, the mere interchange of the coefficients $t_j$
and $t_k$ (equal to the probabilities of the outcomes $x_j$ and $x_k$) can easily result in
an outcome different from $x_i$, even if neither $x_j$ nor $x_k$ is equal to $x_i$! This can
readily be verified in the figure above. However, the probability $p(x_i \mid x(\psi^s))$ of the
outcome is a function of $t_i$ only, hence the probability itself is non-contextual.
Conversely, given a state of an observer $\psi^m$ and a system state $\psi^s$ that interact
to yield the outcome $x_k$, it is often possible to change the outcome of the
Bayes-optimal observer to a different outcome by interchanging suitable coefficients
of the observer, leaving $r_k$ untouched. This means that changing only the
observer's preferences over the outcomes $x_j$ and $x_l$ may let the Bayes-optimal
observer decide that an outcome other than $x_k$ is more optimal, even if $j$, $l$ and $k$
are all different! This contextual aspect of the outcome assignment can here
be understood as a result of the inescapable bias introduced by the state of
the observer in producing a single outcome, for the coefficients of his state
represent his tendencies for each outcome^. Perhaps somewhat paradoxically, it
is precisely through the averaging procedure over all the different possibilities
for this bias that a non-contextual probability emerges. From the same figure we
see that the contextuality of the outcome assignment depends on the classical
entropy of the state. According to a well-known theorem due to Shannon, the
higher the entropy of the state $\psi^s$, the closer the coefficients of $\psi^s$ in (23) are
to $1/n$ and the closer this state will reside to the centre of the simplex, effectively
limiting the possibilities for producing a contextual outcome change by
interchanging coefficients.
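
A small numerical instance may make the contextuality of the single outcome concrete. The sketch below assumes, as an illustration only, that the real-case decision rule selects the index maximizing $t_k / r_k$. The coefficient values are hypothetical and chosen by hand.

```python
def bayes_optimal_outcome(t, r):
    """Index k maximising t[k] / r[k] (assumed form of the decision rule)."""
    return max(range(len(t)), key=lambda k: t[k] / r[k])

t = [0.5, 0.3, 0.2]    # system coefficients (t_1, t_2, t_3)
r = [0.45, 0.35, 0.2]  # observer coefficients (r_1, r_2, r_3)

before = bayes_optimal_outcome(t, r)

# Interchange t_2 and t_3; t_1 is left untouched.
t_swapped = [t[0], t[2], t[1]]
after_system_swap = bayes_optimal_outcome(t_swapped, r)

# Interchange r_2 and r_3 instead; r_1 is left untouched.
r_swapped = [r[0], r[2], r[1]]
after_observer_swap = bayes_optimal_outcome(t, r_swapped)

print(before, after_system_swap, after_observer_swap)  # 0 2 1 (zero-based indices)
```

In this instance the outcome $x_1$ is replaced by $x_3$ when only $t_2$ and $t_3$ are interchanged, and by $x_2$ when only $r_2$ and $r_3$ are interchanged, although neither $t_1$ nor $r_1$ was changed.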

3.5 Bayes-optimal observation in complex Hilbert space 

Complex Hilbert spaces are of considerable interest as they arise naturally in 
many prominent scientific areas including quantum theory, signal analysis (both
in time-frequency and in wavelet analysis), electromagnetism and electronic
networks^, and the more recently founded shape theory j^. The natural setting 

^This tendency could be revealed if we fix the state of the observer and observe (by means
of a second observer) the relative frequencies for the outcomes he produces when he measures
members of an ensemble of randomly chosen states.

^Interestingly, the name probability amplitude, and indeed the Born interpretation of the
wave vector in quantum mechanics, were conceived by Born in analogy with electromagnetic

for the discrete state space in these examples is the space of square summable
functions on a Hilbert space $\mathcal{H}_n(\mathbb{C})$ over the field of complex numbers. A general
state of the system $\psi^s \in \Sigma_s = \mathcal{H}_n(\mathbb{C})$ can then be written as:

$$\psi^s = \sum_{i=1}^{n} q_i x_i \qquad (36)$$

where $q_i \in \mathbb{C}$ and $|\psi^s| = 1$. In this case the outcome set $X$ consists of an
orthonormal frame of complex vectors $\{x_i\}$. An observer (or a detector, which
is quite the same for our purposes) usually has a very large number of internal 
degrees of freedom. Accordingly it lives in a Hilbert space of appropriately high 
dimensionality. However, by the Schmidt bi-orthogonal decomposition theorem, 
we know we can model every possible interaction between two systems, one living 
in a Hilbert space of dimension n and one in a Hilbert space of dimension m 
with m > n, by an interaction of two systems, each one living in a Hilbert space 
of dimension $n$. With this in mind, we model the set of states of the observer
as unit vectors in $\mathcal{H}_n$:

$$\Sigma_m = \{\psi^m \in \mathcal{H}_n(\mathbb{C}) : |\psi^m| = 1\}$$

The reader should take note of the fact that, every time we speak about
"the state of the observer", we mean the state in the subspace indicated by the
Schmidt bi-orthogonal decomposition theorem. The state of the observer, to us, 
always means only that part of the state that is of relevance to the production of 
the outcome. This is especially relevant for the interpretation of sentences such 
as "uniform distribution of initial observer states", which taken too literally, 
would indicate the observer is perhaps doing something completely different 
than observing. The state of an observer with respect to an experiment with
outcome set $X$ can be written as ($r_i \in \mathbb{C}$)

$$\psi^m = \sum_{i=1}^{n} r_i x_i \qquad (37)$$

Because the coefficients now assume complex values, they cannot be interpreted
as probabilities, since we do not have a total order relation in the field of
complex numbers. This difference also affects the deeper, deterministic level of
the description in a profound way. Let us explain why this is the case. For the 
statistical states of the former section, each eigenset is a subsimplex of the state 
space. A simplex is a (very) special case of a convex set. Because the eigensets 
share at most a lower dimensional face, any two different eigensets (for a fixed 
system state) can be separated^ by a single hyperplane. But in a complex 

waves. Here the norm is not unity, but equal to the energy in the wave, and probability
conservation is replaced by conservation of energy.

^If $C_1$ and $C_2$ are two sets in $\mathbb{R}^n$, then a hyperplane $H$ is said to separate $C_1$ and $C_2$ iff
$C_1$ is contained in one of the closed half-spaces associated with $H$ and $C_2$ lies in the opposite
closed half-space. Two convex sets in $\mathbb{R}^n$ that share at most an affine set of dimension $n - 1$
can be separated by a hyperplane.

space a hyperplane does not separate that space in two half-spaces. To apply
the criterion of Bayes-optimality, one needs to decomplexify the space to restore
the order relation, but this can be done in a variety of ways. On the other hand,
this plurality of decomplexifications need not bother us too much. Just as in
the case of the statistical states of the former section, the observer can check the
statistical validity of his outcome assignment by verifying that the probability
(in the sense of a relative frequency) that results from repeated application of his
outcome assignment equals the assumed probability. In the same way, we can
simply postulate, or even guess, a specific form of the probability assignment
and justify it a posteriori: if the relative frequency of an outcome (as a result
of the observer's outcome assignment, based on the Bayes-optimal condition)
converges to a limit that yields (a monotone function of) the very probability
assignment he used to obtain those outcomes, the Bayes-optimal observer knows
he was Bayes-optimal. Let us attempt a minimal generalization of the real case
(31), with $\psi^s$ and $\psi^m$ defined as in (36) and (37), and $j = 1, \ldots, n$; $j \neq k$:

The only difference with (31) is that we take the modulus of the coefficients and
that the set contains complex vectors, which is why we have given the eigenset
the superscript $\mathbb{C}$. To check the consistency of our Bayes-optimal observer in
the complex state space, we evaluate the Lebesgue measure $\nu(\mathrm{eig}^{\mathbb{C}}(x_k, \psi^s))$.
To this end, we regard the measure $\nu$ in $\mathbb{C}^n$ as the Lebesgue measure over $\mathbb{R}^{2n}$.
The calculation of the measure by direct integration can be avoided by use of a
mapping $\omega$ that preserves measures. A measurable mapping $\omega$ between measure
spaces $(E, \mathcal{A}, \mu)$ and $(S, \mathcal{B}, \nu)$ is called a measure-preserving mapping if, for every
$B \in \mathcal{B}$, we have $\mu(\omega^{-1}(B)) = \nu(B)$. In appendix C we demonstrate that the
component-wise (or Hadamard) product of a complex vector with its complex
conjugate, which sends elements of the complex unit sphere
$S_n = \{z \in \mathbb{C}^n : \sum_{i=1}^{n} z_i \bar{z}_i = 1\}$ to the $(n - 1)$-simplex
$\Delta_{n-1} = \{x \in \mathbb{R}^n : \sum_{i=1}^{n} x_i = 1,\ x_i \geq 0\}$, is
indeed measure preserving in this sense. We have given a graphic representation
of the action of $\omega$ in Figure 3. We are now in a position to prove our main
result.
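
The action of the component-wise product can be explored numerically. The sketch below is an illustration under stated assumptions, not the proof of appendix C: it draws points uniformly from the complex unit sphere (by normalizing complex Gaussian vectors), applies the map, and checks one consequence of uniformity on the simplex for $n = 3$, namely that the region $x_1 > 1/2$ carries a quarter of the total measure. The function names `sample_complex_sphere` and `omega` are hypothetical.

```python
import random

def sample_complex_sphere(n, rng):
    """Uniform point on the unit sphere of C^n via normalised complex Gaussians."""
    z = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    norm = sum(abs(c) ** 2 for c in z) ** 0.5
    return [c / norm for c in z]

def omega(z):
    """Component-wise product of z with its complex conjugate."""
    return [(c * c.conjugate()).real for c in z]

rng = random.Random(1)
n, trials = 3, 200_000
hits = 0
max_sum_error = 0.0
for _ in range(trials):
    x = omega(sample_complex_sphere(n, rng))
    max_sum_error = max(max_sum_error, abs(sum(x) - 1.0))
    if x[0] > 0.5:
        hits += 1

# For the uniform measure on the 2-simplex, the region {x_1 > 1/2} is a
# sub-triangle with a quarter of the total area.
fraction = hits / trials
print(max_sum_error, fraction)
```

The image points sum to one up to rounding, so they lie in the simplex, and the observed fraction should be close to 0.25 if the image measure is uniform.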

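Theorem 5

$$p(x_k \mid \psi^s) = |\langle x_k, \psi^s \rangle|^2$$

Proof. With $\mathrm{eig}^{\mathbb{C}}(x_k, \psi^s)$ defined by (38), and

$$C_k^s = \,]\omega(x_1), \ldots, \omega(x_{k-1}), \omega(\psi^s), \omega(x_{k+1}), \ldots, \omega(x_n)[$$

it is straightforward to show that (for more details, see 0) we have:

$$C_k^s \subset \omega(\mathrm{eig}^{\mathbb{C}}(x_k, \psi^s)) \subset [C_k^s]$$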

Let $\bar{\mu}$ and $\bar{\nu}$ stand for the normalized versions of the measures $\mu$ and $\nu$ in
the proof in appendix C, so that their constant of proportionality equals one:
$\bar{\nu}(\omega^{-1}(A)) = \bar{\mu}(A)$. By definition $p(x_k \mid \psi^s) = \bar{\nu}(\mathrm{eig}^{\mathbb{C}}(x_k, \psi^s))$, and by the
previous lemma, we have

$$\bar{\nu}(\mathrm{eig}^{\mathbb{C}}(x_k, \psi^s)) = \bar{\nu}(\omega^{-1}(C_k^s))$$

The normalized measure $\bar{\mu}(C_k^s)$ of the real simplex $C_k^s$ was calculated in the real
state space. A completely equivalent calculation gives us

$$\bar{\mu}(C_k^s) = \langle \omega(x_k), \omega(\psi^s) \rangle = |q_k|^2 = |\langle x_k, \psi^s \rangle|^2$$

Figure 3: The action of the mapping $\omega$ sends elements of the unit sphere to the
standard simplex (upper figure). The probability for the occurrence of outcome
$x_k$ is the measure of the eigenset corresponding to outcome $x_k$ and is calculated
in the simplex using the measure-preserving mapping $\omega$. The eigensets are
depicted in the lower figure for the simplex; it is not possible to show graphically
what these sets look like in the complex unit sphere.
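
The statement of the theorem can be probed with a short simulation. This is a sketch under assumptions, not the author's calculation: the observer state is drawn uniformly from the complex unit sphere for every trial, and the complex eigenset condition is assumed to select the index $k$ that maximizes $|q_k| / |r_k|$. The system state used here is an arbitrary normalized example.

```python
import random

def sample_complex_sphere(n, rng):
    """Uniform point on the unit sphere of C^n via normalised complex Gaussians."""
    z = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    norm = sum(abs(c) ** 2 for c in z) ** 0.5
    return [c / norm for c in z]

def bayes_optimal_outcome(q, r):
    """Index k maximising |q_k| / |r_k| (assumed complex eigenset condition)."""
    return max(range(len(q)), key=lambda k: abs(q[k]) / abs(r[k]))

def born_probabilities(q):
    """Squared moduli of the coefficients of a normalised state."""
    return [abs(c) ** 2 for c in q]

rng = random.Random(2)
# Fixed system state with squared moduli 0.36, 0.4096 and 0.2304 (illustrative).
q = [complex(0.6, 0.0), complex(0.0, 0.64), complex(0.48, 0.0)]

trials = 200_000
counts = [0] * len(q)
for _ in range(trials):
    r = sample_complex_sphere(len(q), rng)  # fresh observer state per trial
    counts[bayes_optimal_outcome(q, r)] += 1

freqs = [c / trials for c in counts]
born = born_probabilities(q)
print(freqs, born)
```

Under these assumptions the relative frequencies should agree with the squared moduli of the coefficients up to sampling noise, independently of the phases of the $q_k$.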



We see that indeed the Bayes-optimal observer recovers the Born rule as a
result of his attempt to maximize the odds with respect to the outcome that
pertains to the system. To be precise, we did not maximize the odds, because
substitution of the Born rule for the probability in (22) gives:

whereas our observer calculated the ratios:

where the tilde denotes the fact that, strictly speaking, this is not a likelihood,
because $|q_k|$ and $|r_k|$ aren't probabilities (they are square roots of
probabilities). Yet, it is obvious that the value of $k$ for which (39) and (40) are
maximal is the same, because one is the square of the other, and squaring is a
monotone function on the non-negative reals. As a consequence, it does not matter if the
Bayes-optimal observer works with (39) or with (40): repeated application of either strategy
on the same pure state will make the relative frequency converge to the Born
rule in exactly the same way in both cases.
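
As a reading aid, the relation between the two quantities can be written out explicitly. The forms below are an assumption, chosen only to be consistent with the statements that $|q_k|$ and $|r_k|$ are square roots of probabilities and that one ratio is the square of the other. The symbols $\Lambda_k$ and $\tilde{\Lambda}_k$ are labels used for this sketch.

```latex
\Lambda_k = \frac{|q_k|^2}{|r_k|^2}, \qquad
\tilde{\Lambda}_k = \frac{|q_k|}{|r_k|}, \qquad
\Lambda_k = \tilde{\Lambda}_k^{\,2}
\;\Longrightarrow\;
\arg\max_k \Lambda_k = \arg\max_k \tilde{\Lambda}_k .
```

The implication holds because $x \mapsto x^2$ is strictly increasing on $[0, \infty)$, so both quantities order the outcomes in the same way.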



4 Consequences of Bayes-optimal observation 
4.1 Decision invariance and unitarity 

The outcome chosen by a Bayes-optimal observer is the one that maximizes
the corresponding likelihood ratio $\Lambda_k$. Any monotonically increasing function of
the likelihood ratios preserves their relative order, and hence their maximum.
By (31) and (38), this carries over to the coefficients of the state vectors in both
the real and the complex state space. The same is true for multiplication by
a phase factor, which is cancelled by taking the moduli in (38). As a result,
the state space is not only a vector space, it is a projective vector space: if
the vectors in the state space are multiplied by $z \in \mathbb{C}$, $0 < |z| < \infty$, this does
not change the result of the decision procedure adopted by the Bayes-optimal
observer. There is another interesting class of transformations that leaves the
Bayes-optimal decision unaltered. For any $\psi^s$, the probability of $x_k$ is defined
as:

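$$p(x_k \mid \psi^s) = \mu(\mathrm{eig}^{\mathbb{C}}(x_k, \psi^s))$$

Because $\omega(\mathrm{eig}^{\mathbb{C}}(x_k, \psi^s)) \subset [C_k^s]$, $\omega$ is continuous, and the elements of $[C_k^s]$
have finite norm, the norm of the vectors in $\mathrm{eig}^{\mathbb{C}}(x_k, \psi^s)$ is finite too. We can
then apply a linear transformation to the base vectors of the state space:

$$T : \Sigma_s \to \Sigma_s \qquad (41)$$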

n 
i 

The eigenset $\mathrm{eig}^{\mathbb{C}}(x_k, \psi^s)$ will accordingly be transformed by applying $T$ to
$x_k$ and $\psi^s$. By Lebesgue measure theory, the volume of the transformed set is
proportional to the volume of the original set, the constant of proportionality
being the absolute value of the determinant of the transformation:

$$\mu(T(\mathrm{eig}^{\mathbb{C}}(x_k, \psi^s))) = |\det(T)|\, \mu(\mathrm{eig}^{\mathbb{C}}(x_k, \psi^s))$$

for all eigensets $\mathrm{eig}^{\mathbb{C}}(x_k, \psi^s)$. This is a classic result^, and we refer the interested
reader to ((IHIi p54) for a proof. Note that this would typically be untrue for
a nonlinear transformation. As a result, all transformations with | det(T)| = 1 
leave the probabilities invariant, which means we have invariance under unitary 
transformations. Intuitively this is obvious: if the probabilities have their origin 
in a measure on state space, then scaling, phase shifting, forming the mirror 
image, or 'rotating' the entire state space, does not alter the relative proportions 
of the eigensets, hence the invariance. Of course, it is easy to derive from the 
Born rule that the probabilities are invariant under unitary transformations, 
because the Born rule is the square modulus of an inner product and a unitary 
transformation can be defined as a linear operator that leaves the inner product 
invariant. Our invariance principle tells us the same story at a deeper level, for 
not only the probabilities are invariant under unitary transformation, but also 
each obtained outcome will be the same whether or not we unitarily transform 
the eigensets. 
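
The invariance of each individual outcome can be illustrated in two dimensions. The sketch below assumes the decision rule selects the index maximizing $|q_k| / |r_k|$. The unitary matrix, the states and the scale factor are hypothetical values chosen for the example.

```python
import cmath
import math

def inner(a, b):
    """Inner product <a, b>, conjugate-linear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def apply(U, v):
    """Matrix-vector product U v."""
    return [sum(U[i][j] * v[j] for j in range(len(v))) for i in range(len(U))]

def coefficients(frame, state):
    """Coefficients of a state with respect to an orthonormal frame."""
    return [inner(x, state) for x in frame]

def outcome(q, r):
    """Index k maximising |q_k| / |r_k| (assumed decision rule)."""
    return max(range(len(q)), key=lambda k: abs(q[k]) / abs(r[k]))

# A 2x2 unitary built from two illustrative angles.
theta, phi = 0.7, 1.3
U = [[complex(math.cos(theta)), -cmath.exp(1j * phi) * math.sin(theta)],
     [cmath.exp(-1j * phi) * math.sin(theta), complex(math.cos(theta))]]

basis = [[1 + 0j, 0j], [0j, 1 + 0j]]  # outcome vectors x_1, x_2
psi_s = [0.6 + 0j, 0.8j]              # system state
psi_m = [0.8 + 0j, -0.6 + 0j]         # observer state

q = coefficients(basis, psi_s)
r = coefficients(basis, psi_m)

# Transform the outcome vectors and both states with the same unitary.
basis_T = [apply(U, x) for x in basis]
q_T = coefficients(basis_T, apply(U, psi_s))
r_T = coefficients(basis_T, apply(U, psi_m))

# Rescale the observer state by an arbitrary non-zero complex number.
z = 2.5 * cmath.exp(0.4j)
r_scaled = [z * c for c in r]

max_coeff_change = max(abs(a - b) for a, b in zip(q + r, q_T + r_T))
print(outcome(q, r), outcome(q_T, r_T), outcome(q, r_scaled), max_coeff_change)
```

The coefficients are unchanged by the unitary up to rounding, and the selected outcome is the same in all three cases, which is the single-outcome version of the invariance discussed above.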

4.2 The elusive quantum to classical transition 

Suppose we have a particular statistical mixture

$$\varphi = \xi \psi_1 + (1 - \xi)\psi_2 \qquad (42)$$

of two (pure) states $\psi_1$ and $\psi_2$ with $\xi \in\, ]0, 1[$. Suppose furthermore that

$$p(x_i \mid \psi_1) = q_1$$

$$p(x_i \mid \psi_2) = q_2$$

^As before, we regard the complex $n$-space as a real $2n$-space, for which the theorem is
applicable.

Then an observing system is said to satisfy the linear mixture property iff

$$p(x_i \mid \varphi) = \xi q_1 + (1 - \xi) q_2 \qquad (43)$$

In words: the probability of a mixture equals the mixture of the probabilities. 
Does the Bayes-optimal observer satisfy the linear mixture property? Well, $\varphi$ is
a statistical mixture, as defined in the section on Bayes-optimal observation of
statistical mixtures, and each of the constituents in the mixture is a pure state,
as defined in the section on Bayes-optimal observation in Hilbert space. So clearly,
our Bayes-optimal observer satisfies the linear mixture property. In essence, this
stems from his initial states being uniformly random (almost everywhere). Indeed,
suppose the distribution of the initial states of the observer is not uniform
a.e. Then one can always find a convex region $S$ in state space with surface
measure $A$, for which the density of observer states is not equal to $1/A$. Without
giving a formal proof, one can see that it is always possible to find two states
$\psi_1, \psi_2 \in S$ and a real number $\xi \in\, ]0, 1[$, such that $\xi\psi_1 + (1 - \xi)\psi_2 \in S$ and for
which the linear mixture property will be violated.
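
A two-outcome example on the simplex shows how a non-uniform observer distribution can break the linear mixture property. The sketch is restricted to the real statistical case and rests on assumptions: outcome $x_1$ is taken to occur iff the observer coefficient $r_1$ is smaller than the system coefficient $t_1$ (the two-outcome form of maximizing $t_k / r_k$), and the Beta(2, 2) density is a hypothetical choice of non-uniform observer distribution.

```python
def p_uniform(t1):
    """P(outcome x_1) when the observer coefficient r_1 is uniform on [0, 1]."""
    return t1  # outcome x_1 occurs iff r_1 < t_1

def p_beta22(t1):
    """Same probability when r_1 follows a Beta(2, 2) density (hypothetical choice)."""
    return 3 * t1 ** 2 - 2 * t1 ** 3  # Beta(2, 2) cumulative distribution function

xi = 0.5
t_a, t_b = 0.2, 0.6                # x_1-components of the two constituent states
t_mix = xi * t_a + (1 - xi) * t_b  # x_1-component of the mixture

def mixture_gap(p):
    """p(mixture) minus the mixture of the probabilities."""
    return p(t_mix) - (xi * p(t_a) + (1 - xi) * p(t_b))

gap_uniform = mixture_gap(p_uniform)
gap_beta = mixture_gap(p_beta22)
print(gap_uniform, gap_beta)
```

For the uniform observer the gap vanishes, while for the Beta(2, 2) observer it equals $0.352 - 0.376 = -0.024$, so the probability of the mixture no longer equals the mixture of the probabilities.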

The linear mixture property is essential to experimental observation: no 
experimenter would put his faith in the hands of a detection apparatus that 
manifestly fails this most basic requirement. From this perspective, the diffi- 
culty of finding an intermediate region between the classical and the quantum, 
originates from the lack of a principle that determines how the observer should 
behave in order to objectively observe the intermediate region in absence of the 
linear mixture property. As an example, suppose we want to determine the
length of a linearly extended system. In a classical setting, we are in principle free
to choose the number of outcomes, and we are allowed to make many observa- 
tions before we settle on the result of a single measurement. For example, we 
can align the zero of the measuring rod with one end point of the system and 
read the outcome at the other end point as many times as we want to. If we 
are not satisfied with the precision that the measuring rod affords, we can pick 
a better one, or improve it by adding a nonius (or vernier) system to it. As 
long as we are able to do this, we are still in a classical regime of observation. 
In the classical regime of observation, the distribution of observer states will be 
highly non-uniform. Ideally, of all possible measurements, the only uncertainty 
we have about the state of the observer that is assumed to be of relevance to the 
measurement outcome, is an uncertainty of the order of the smallest number the 
measuring rod can represent. To decrease the uncertainty about the result, even 
beyond the precision offered by the smallest number the rod can represent, it is 
common scientific practice to perform the measurement many times. Assuming 
identical, independent observations, one can apply standard error theory. In 
the beginning of the eighties, Wootters has shown (HH, |SS1), using standard 
error theory, that the distance (angle) between two states on the unit sphere in 
(real) Hilbert space, is proportional to the number of maximally discriminating 
observations along the geodesic between those two points. This beautiful result 
gains in richness when considered from the point of view that the probabilities 
arise in a Bayes-optimal way. In our search for ever more precise measurements
or measurements on ever smaller constituents of nature, we eventually reach a 
region where we cannot repeat measurements without absorbing the system or 
altering its state. We may not even be able to choose freely the set of outcomes 
for a particular measurement, as is the case in the quantum regime. It is then 
no longer possible to directly obtain the "true" value of a physical quantity, be- 
cause the eigenstate of the observing system may not (and in general will not) 
coincide with the state of the system under investigation. We cannot attempt 
the same measurement (or one with altered eigenstates) on the same system, 
because the state of the system has been altered, or even destroyed. In view of 
this impossibility, we are led to statistical observation on ensembles. We have 
shown it is possible to recover an objective probability if the distribution of 
observer states is uniform. We see that the best possible observation scheme in 
the classical regime entails a minimal uncertainty (i.e. about the interpretation 
of the last digit only) in the state of the observer, and in the quantum regime 
a maximal uncertainty (any outcome is in principle possible) about the state of 
the observer. The consequence of such an interpretation is, that we will only be 
able to identify intermediate regions when we allow for a more complete descrip- 
tion of the observing system. In essence, we need to describe how to go from 
this minimal to this maximal uncertainty state. There are good reasons for cau- 
tiously entering this intermediate region. Some of the beautiful properties of the 
classical and the quantum regime will not hold. For example, the linear mixture 
property cannot be universally satisfied. Moreover, we will obtain probability 
distributions that depend not only on the system, but at least partially on the 
dynamics of the observing system. It is possible to construct explicit models 
that show 0] one can identify an intermediate region where the probabilities 
satisfy neither the classical statistical Bonferroni inequalities^, indicating the
absence of a straightforward Kolmogorovian model, nor the Accardi-Fedullo
inequalities that constrain the set of probabilities that are derivable from a
Hilbert space model. This opens up a whole new area of investigation, but only 
if we are willing to take the bold step of abandoning the full generality of the 
linear mixture property. 

4.3 Is the Bayes-optimal observer objective? 

The purpose of objective observation is to obtain a probability for the outcome 
that depends only on the system under study. How fast the sequence of outcomes 
converges to this probability, depends on how well the observer manages to 
distinguish his state from the state of the system under study. This aspect was 
neglected in the previous discussion. If we apply the Born rule to calculate the
quantities $p(x_i \mid H_0)$ and $p(x_i \mid H_1)$, we imply that $\sum_i p(x_i \mid H_0) = \sum_i p(x_i \mid H_1) = 1$.
However, if the choice between $H_0$ and $H_1$ is indeed a binary decision problem,

^The Bonferroni inequalities indicate when a set of (joint) probabilities can be derived from
a Kolmogorovian probability model. The best known example of a Bonferroni inequality in
the foundations of quantum mechanics is the Bell inequality.

we should have:

$$\sum_i p(x_i \mid H_0) = \alpha \qquad (44)$$

$$\sum_i p(x_i \mid H_1) = 1 - \alpha$$

The reason why this is not contradictory is that the observer chooses
his outcome as if the outcome will be judged afterwards as a binary decision
problem. The observer himself has a priori no clue what the value of $\alpha$ might
be. But even if he were to estimate the value of $\alpha$ after repeated measurements,
this knowledge still cannot help him to give a more optimal outcome.
To the Bayes-optimal observer, knowledge of $\alpha$ would merely have the effect
of scaling the odds in (22) by a constant factor. The choice of the outcome for the
Bayes-optimal observer is based on the maximal likelihood, and a monotone function
of the likelihoods will not change the maximum. Thus we see that the specific
value of $\alpha$ has no influence on the actual choice. If $\mu$ is truly uniform, then
the resulting relative frequency will converge to the Bayes-optimal probability
that only depends on the state of the system, whatever value $\alpha$ happens to
have in practice. However, a small value of $\alpha$ implies that for each outcome, the
probability that the outcome depends on the state of the system is small. So the
expected increase in information about the system as a result of obtaining that
outcome is small too. Evidently this will extend the number of measurements
needed to acquire information about the system. We see that $\alpha$ is a crude
statistical measure for the objectivity of the observer. It represents his ability
to separate interior from exterior. It turns out we can always pick an outcome
that supports $H_0$ more than it supports $H_1$ iff $\alpha > 1/2$. To see this, we proceed
ad absurdum. If no outcome supports $H_0$ more than it supports $H_1$, then for
all $x_i$,

$$\frac{p(x_i \mid H_0)}{p(x_i \mid H_1)} < 1$$

But then^^ we have:

$$\frac{\sum_i p(x_i \mid H_0)}{\sum_i p(x_i \mid H_1)} < 1, \qquad (45)$$

which implies $\alpha < 1 - \alpha$. We obtain the contradiction iff $\alpha > 1/2$. In words: if
we can do only slightly better than completely arbitrary in letting the outcome
probability depend on the system, we can guarantee the existence of an outcome
that maximizes the odds and for which the odds are greater than unity. In fact, for any value of $\alpha$
we can find an (almost always unique) outcome that maximizes the odds, but

^^This specific condition is known in the literature as majorization. It plays an important
role in the investigation of bipartite state conversions by local operations and classical
communications (LOCC). This may seem relevant in connection to our problem, as the basic scheme
we present can be described as a bipartite state conversion problem. However, we cannot use
the many interesting results in the literature on bipartite state conversion because LOCCs in
this particular problem are operationally defined by means of local unitary transformations
and a local measurement, and it is the local measurement that we seek to understand!

when $\alpha > 1/2$, the maximal likelihood ratio enjoys the property of being greater
than one.
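
The argument can be checked on randomly generated examples. The sketch below is an illustration only: the weights are arbitrary positive numbers normalized as in (44), the value $\alpha = 0.6$ is a hypothetical choice, and the rescaling constant $\alpha / (1 - \alpha)$ is an assumption standing in for whatever constant factor knowledge of $\alpha$ would introduce.

```python
import random

def random_distribution(n, total, rng):
    """Positive weights on n outcomes that sum to `total`."""
    w = [rng.random() + 1e-9 for _ in range(n)]
    s = sum(w)
    return [total * v / s for v in w]

rng = random.Random(3)
alpha = 0.6  # illustrative value above 1/2
always_above_one = True
same_argmax = True
for _ in range(1000):
    p0 = random_distribution(5, alpha, rng)      # sums to alpha, cf. (44)
    p1 = random_distribution(5, 1 - alpha, rng)  # sums to 1 - alpha
    odds = [a / b for a, b in zip(p0, p1)]
    if max(odds) <= 1:
        always_above_one = False
    # Rescaling all odds by one positive constant leaves the maximiser unchanged.
    scaled = [o * alpha / (1 - alpha) for o in odds]
    if odds.index(max(odds)) != scaled.index(max(scaled)):
        same_argmax = False

print(always_above_one, same_argmax)
```

In every generated example with $\alpha > 1/2$ the maximal odds exceed one, and the maximizing outcome does not depend on the constant rescaling.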

4.4 The Bayes-optimal observer as a paradigm for observation

The proposed principle of observation is based on a Bayesian treatment of a 
binary decision problem, but is not used in its usual decision-theoretic form. In 
decision theory we seek to establish which of the hypotheses enjoys the strongest 
support in evidence of the data. In our case, there is no data to feed the 
likelihood with, because we produce the data by means of the odds. The way 
we employ the principle is like an inverse decision problem, as if anticipating 
that the result will be judged afterwards by a decision procedure performed 
by one with absolute knowledge of the system and observer states prior to the 
measurement. The possibility of applying Bayesian decision theory in quantum 
mechanics came to me through the realization that the criterion established by 
Aerts D. at the end of [2] to characterize the so-called hidden measurements, is
a monotone function of the Bayesian odds and hence leads to the same choice
for the outcome. In this sense, this paper can be seen as providing a Bayesian
foundation for the structure of the hidden measurements as given in, for example,
[2] and [3], and extending the results to the complex Hilbert space.

More recently it has come to my attention that a somewhat similar paradigm 
(without reference to quantum mechanics) is proposed in several papers that 
deal with visual perception by humans. The idea that the visual system is 
rooted in inference, can be traced back to the work of Helmholtz [20], who
proposed the notion of unconscious inference. It was only in the last decade 
that it was accepted and translated into a mathematical framework, not in the 
least because computer scientists who want to model the human vision system 
are faced with the apparent complexity that underlies human perception. The 
Bayesian framework provides the tools necessary to understand and explain a 
wide variety of sometimes baffling visual illusions that occur in human perception.
In retrospect, we have borrowed the term 'Bayes-optimal' from this
literature, because the term so neatly describes the principle and it did not seem 
appropriate to introduce a new term. There are however some differences in the 
application of the principle with respect to our proposal. In the literature on 
visual perception, the prior distributions are derived from real world statistics. 
Of course, this begs the question how these prior distributions were obtained 
in the first place. There are two basic possibilities to obtain a prior: either a 
prior distribution is based on some theoretical assumption, or it is established 
by looking at the relative frequency of actual recordings. The first option is 
the one we pursued in this article, where we assumed a uniform distribution of 
observer states^^. In the second case, which is the one adopted in the literature 
on perception, one has the advantage of being able to explain a wide variety of 

^^The absence of a more informative prior distribution effectively reduces the criterion of
Bayes-optimality to a Neyman-Pearson maximum-likelihood criterion.

visual effects in human perception, and how the priors can be adapted through
the use of Bayesian updating, but we cannot explain observation itself. The 
relative frequency needed to obtain the prior, is rooted in the observation of 
data, which requires another prior and so on ad infinitum. One can break from 
this loop by reconsideration of what a state is. In the literature on perception 
states are considered only as (real) statistical mixtures, severely limiting both 
the applicability and the philosophical scope of the paradigm. The state, as we 
have defined it here, can be a complex vector, not obtainable as a mixture in 
principle, and yet give rise to probabilities if we attempt to observe it as well
as possible. So the state is simultaneously a description of the 'mode of being'
(the pure state that physically interacts), and a 'catalogue of information' (the 
probabilities the Bayes-optimal observer obtains). 

The possibility that the same principle governs human perception and quantum
mechanical observation strengthens the Bayes-optimal paradigm. Measurement
apparatuses and human perception can be rooted in the same principle:
the attempt to relate the outcome to the object under investigation as unambiguously
as possible by choosing the outcome that has the largest odds (22).
By repeating the observation many times, each time randomizing the internal 
state of the sensor, we obtain an invariant of the observation that pertains solely 
to the system. 

Another interesting link with the existing literature was pointed out to me 
by Thomas Durt jJS]. The regions of the Bohm-Bub model jJT] coincide with 
our definition of the eigensets in the complex case (38). Moreover, Bohm and
Bub propose a uniform measure of states that they interpret as apparatus states.
They perform the integration directly for the two-dimensional case, and indicate
that the integration scheme can be extended to higher dimensions. Their
result, like ours, is the reproduction of the Born rule. From the perspective of 
this paper, Bayes-optimal observation yields an interpretation for the regions 
employed by Bohm and Bub. 

5 Concluding remarks 

The search for a Bayesian or decision-theoretic framework for quantum probability
has recently been the subject of a number of interesting publications (^3, [TB] .
[TH). (221 , HH], and IS]). One important motivation for seeking such
an interpretation is that it allows for a subjective interpretation of quantum
probability by regarding the state vector as a mathematical representation of 
the knowledge an agent has about a system. An often heard critique of Bayesian 
interpretations of quantum probability is that, from a strictly Bayesian point of 
view, the state vector represents the knowledge available to the agent that deals 
with it. A majority of physicists rejects this notion, mainly because they feel 
the relative frequencies obtained in actual experiments are objective features of 
the system, and not of the knowledge of the agent. The Bayesian pragmatic 
response to this, is that what can be inferred about a system always depends on 
one's prior knowledge of the system. However, in a theory that takes observation
as a primitive concept, one cannot assume to have a priori knowledge. This is
what we have modelled here as the uniform distribution of initial observer states,
and Bayes-optimal observation of an ensemble of identical states will then result
in an unbiased probability. If it is physically possible to obtain unbiased estimates
for a sufficient number of observables so that we can reconstruct the state
vector, then, at least in an operational sense, the state can be truly assigned
in an objective way to a system. Besides the objective informational content of
the state, the state may also represent an objective reality. This is in agreement
with the fact that we started from assumption (12): that the state is a realistic
description of the system, and it is the state of the system and the observer that 
physically and deterministically interact to produce the measurement outcome. 
Systems are in a state, and that state uniquely determines every possible inter- 
action. The state vector truly represents complete information about a system, 
but not merely as a collection of objective attributes, but as a representation 
of the possible deterministic interactions with any other system, in particular 
observing systems. A classically objective attribute, from this perspective, is 
then the limiting case where the same result follows for the vast majority of 
states of Bayes-optimal observing systems that the system can interact with. 

The proposed interpretation is falsifiable in principle but there are obstacles 
along the way. If we succeed in preparing the relevant degrees of freedom of 
the states of the apparatus, we could produce a non-uniform distribution for the 
initial states. Such a prepared apparatus would be able to distinguish some pairs 
of states better, and some pairs of states worse than the usual Born rule allows, 
which means it can only be used to our advantage if we possess some information 
about the state prior to the measurement. It also means that the probability 
for the occurrence of an outcome when we measure a mixture of states, depends 
nonlinearly on the probabilities for each component of the mixture; a failure of 
what we have called the "linear mixture property". This would most likely lead 
to a rejection of the validity of the apparatus by the majority of experimentalists. 
And, as we hope to have shown, in the complete absence of prior information it is 
not evidently desirable to deviate from a complete lack of knowledge of the 
apparatus state. 
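
The effect of a non-uniform distribution of initial apparatus states can be illustrated numerically. What follows is only a minimal sketch under stated assumptions, not the construction used in the proofs: it assumes two outcomes, an outcome rule that picks the index maximizing t[i] / m[i] (weight of the system state over weight of the observer state), and a hypothetical Beta(3, 1) distribution for the first observer weight. All function names are illustrative. Under these assumptions outcome 0 occurs exactly when m[0] < t[0], so a uniform observer distribution should give a relative frequency close to t[0], while the Beta(3, 1) distribution should give roughly t[0]**3.

```python
import random


def bayes_optimal_outcome(t, m):
    """Return the index i maximizing the assumed likelihood ratio t[i] / m[i]."""
    def ratio(i):
        # A vanishing observer weight makes the ratio unbounded.
        return t[i] / m[i] if m[i] > 0 else float("inf")
    return max(range(len(t)), key=ratio)


def frequency_of_first_outcome(t0, alpha, beta, trials=50000, seed=0):
    """Relative frequency of outcome 0 when m[0] is drawn from Beta(alpha, beta)."""
    rng = random.Random(seed)
    t = [t0, 1.0 - t0]
    hits = 0
    for _ in range(trials):
        m0 = rng.betavariate(alpha, beta)
        if bayes_optimal_outcome(t, [m0, 1.0 - m0]) == 0:
            hits += 1
    return hits / trials


# Beta(1, 1) is the uniform distribution (unprepared apparatus).
uniform = frequency_of_first_outcome(0.5, 1, 1)
# Beta(3, 1) is a hypothetical stand-in for a "prepared" apparatus.
prepared = frequency_of_first_outcome(0.5, 3, 1)
print(uniform, prepared)
```

In this toy setting the prepared apparatus no longer reproduces the weight t[0] as a relative frequency, which is the kind of deviation from the Born rule discussed above.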

Perhaps there is another, still deeper, reason why it is not possible to com- 
pletely control the state of the observer at the quantum level. The source of 
probability in observation, the randomness in the state of the observer, may 
very well at some point become fundamentally uncontrollable. Logical arguments 
seem to defend at least the possibility of such a thesis. In ^2], it is 
shown by an elegant construction, that for every observer there will be differ- 
ent states of himself that he cannot distinguish. In it is shown that, on 
purely logical grounds, no observer can determine whether his observations are 
entirely faithful^^. It seems that, for every single measurement outcome, there 
is a trade-off between the information an observer can choose to extract about 
himself, and about the system he is observing. This trade-off can be quantified. 

^^To the best of our knowledge, the relation between such logical arguments and the 
quantum measurement problem was first pointed out in 1977 in a remarkable pioneering 
paper by Dalla Chiara [14]. 

It is argued in PHI and |S], on thermodynamical grounds, that any gain in 
information about a system is accompanied by an equal increase of entropy about 
the state of the observing system. If this is indeed the underlying structure for 
the occurrence of the quantum probabilistic structure, then the probabilities in 
quantum mechanics are indeed ontic and epistemic at the same time. From an 
absolute perspective, probability always arises from a lack of knowledge; 
it is a measure over deterministic events. But to the one who 
observes, this lack of knowledge may be fundamentally irreducible. It might 
turn out that, after all, Einstein and Bohr were both right about the origin of 
probabilities in quantum mechanics. 

6 Appendices 

6.1 Appendix A 

Lemma 6 If $\mu$ is a (probability) measure such that $\mu(\Delta_{n-1}(X)) = 1$, and 
$C_k^s$ is defined by the convex closure of (35), then we have $\mu([C_k^s]) = t_k$. 

Proof. Let $\rho_{n-1}$ be the (not necessarily normalized) $(n-1)$-Lebesgue measure in 
$\Delta_{n-1}(X)$. Then we have 

$$\mu([C_k^s]) = \frac{\rho_{n-1}([x_1, \ldots, x_{k-1}, x(\psi^s), x_{k+1}, \ldots, x_n])}{\rho_{n-1}([x_1, \ldots, x_n])} = \frac{\rho_{n-2}(B)\, d(B, x(\psi^s))}{\rho_{n-2}(B)\, d(B, x_k)}$$

In this last equation, $B = [x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n]$ is the face shared by the 
two simplices, and $d(B, a)$ is the smallest Euclidean distance between a point $a$ and 
the points of face $B$, which is proportional to the norm of the orthogonal 
projection of $a$ onto a unit vector $b$ perpendicular to $B$. In $\mathbb{R}^n$ no unique vector is 
perpendicular to $B$ (which only has affine dimension $n - 2$), but as long as we 
stick to the same vector $b$ for both simplices, the same constant of 
proportionality will apply, and the ratio will eliminate that constant. Pick the 
$x_k$ base vector as $b$, which is obviously unit-norm and perpendicular to $B$. The 
orthogonal projection of the top of $C_k^s$ onto $b$ is 
$\langle x(\psi^s), x_k \rangle x_k = t_k x_k$. For $\Delta_{n-1}$, the top is the vector $x_k$ itself and its 
projection is $\langle x_k, x_k \rangle x_k = x_k$. Hence we have 

$$\frac{d(B, x(\psi^s))}{d(B, x_k)} = \frac{\| \langle x(\psi^s), b \rangle b \|}{\| \langle x_k, b \rangle b \|} = \frac{\| t_k x_k \|}{\| x_k \|} = t_k$$

■ 
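
The volume ratio in Lemma 6 can be checked with exact arithmetic. The sketch below is an illustration under an added assumption rather than part of the proof: because both simplices lie in the same hyperplane (coordinates summing to one), the ratio of their $(n-1)$-volumes equals the ratio of the determinants of their vertex matrices. The helper names `det` and `volume_ratio` are hypothetical.

```python
from fractions import Fraction


def det(matrix):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def volume_ratio(t, k):
    """Volume of the simplex with vertex x_k replaced by t, relative to the full simplex."""
    n = len(t)
    full = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    replaced = [row[:] for row in full]
    replaced[k] = [Fraction(x) for x in t]
    return abs(det(replaced)) / abs(det(full))


# Example weights t_i of a system state; they sum to one.
t = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
ratios = [volume_ratio(t, k) for k in range(len(t))]
print(ratios)
```

For each index k the ratio should come out equal to t[k], in line with the statement of the lemma.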

6.2 Appendix B 

Lemma 7 Let $C_k^s$ be defined as in (35) and $\mathrm{eig}(x_k, x(\psi^s))$ as defined earlier, then: 

$$C_k^s \subset \mathrm{eig}(x_k, x(\psi^s)) \subset [C_k^s]$$

Proof. We start with the first inclusion. Suppose $x(\psi^m)$ is in one of the 
open $(n-1)$-simplices $C_k^s$, then, by definition, there exist $\lambda_i$ such that, with 
$0 < \lambda_i < 1$, $\sum_i \lambda_i = 1$, 

$$x(\psi^m) = \lambda_k x(\psi^s) + \sum_{i \neq k}^{n} \lambda_i x_i$$

On the other hand we have that $x(\psi^s) \in \Delta_{n-1}$ and hence there exist $t_i > 0$, 
$\sum_i t_i = 1$ such that the following holds: 

$$x(\psi^s) = \sum_{i=1}^{n} t_i x_i$$

Substitution of the second expansion into the first yields 

$$x(\psi^m) = \lambda_k t_k x_k + \sum_{i \neq k}^{n} (\lambda_i + \lambda_k t_i) x_i$$

Calculating the likelihood ratios we obtain $\Lambda_k = \frac{t_k}{\lambda_k t_k} = \frac{1}{\lambda_k}$, and for $i \neq k$ we 
have: 

$$\Lambda_i = \frac{t_i}{\lambda_i + \lambda_k t_i}$$

We easily see that $\Lambda_k > \Lambda_i$ iff $\lambda_i > 0$, which is satisfied by assumption. Hence, 
by the outcome assignment rule, every $x(\psi^m) \in C_k^s$ gives an outcome $x_k$, 
establishing the result. For the second inclusion, suppose there exists some 
$x(\psi^m) \in \Delta_{n-1}$ with $x(\psi^m) \notin [C_k^s]$. The sets $C_k^s$ in our theorem, as can be seen 
from the definition (35), are disjoint open $(n-1)$-simplices. If we had defined 
them by means of the closed convex closure, they would maximally share the 
$(n-2)$-simplex $\Delta^s_{n-2}(j, k) = [x(\psi^s), x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n]$: 

$$[C_j^s] \cap [C_k^s] = \Delta^s_{n-2}(j, k)$$

Assume first that $x(\psi^m)$ is not in the boundary of $[C_k^s]$, i.e. not in one of the 
lower dimensional sub-simplices $\Delta^s_{n-2}(j, k)$. Then $x(\psi^m) \in C_i^s$ with $i \neq k$. Because 
of the first inclusion demonstrated above, we have $x(\psi^m) \in \mathrm{eig}(x_i, x(\psi^s))$ and 
hence $x(\psi^m) \notin \mathrm{eig}(x_k, x(\psi^s))$. If on the other hand $x(\psi^m) \in \Delta^s_{n-2}(j, k)$, our outcome 
assignment on the basis of the maximum likelihood principle is ambiguous, as 
there will be two equal maxima, and even more when $x(\psi^m)$ is chosen in a 
still lower dimensional subsimplex. However, we are free to choose whatever 
outcome we like as long as it is one of the maxima. Because the maxima 
coincide, these points lie in the boundary and hence the conclusion remains 
$\mathrm{eig}(x_k, x(\psi^s)) \subset [C_k^s]$. ■ 
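
Taken together, Lemmas 6 and 7 suggest a simple numerical experiment. The sketch below is a Monte Carlo illustration under stated assumptions, not a substitute for the proofs: the observer point is drawn uniformly from the simplex (generated here as normalized independent exponential variables), and the outcome is taken to be the index maximizing the likelihood ratio t[i] / m[i], with t the weights of the system state and m those of the observer state. The function names are hypothetical.

```python
import random


def sample_uniform_simplex(n, rng):
    """Uniform point of the (n-1)-simplex via normalized i.i.d. exponentials."""
    e = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(e)
    return [x / total for x in e]


def bayes_optimal_outcome(t, m):
    """Return the index i maximizing the assumed likelihood ratio t[i] / m[i]."""
    def ratio(i):
        # A vanishing observer weight makes the ratio unbounded.
        return t[i] / m[i] if m[i] > 0 else float("inf")
    return max(range(len(t)), key=ratio)


def outcome_frequencies(t, trials=50000, seed=0):
    """Relative outcome frequencies over repeated trials with fresh observer states."""
    rng = random.Random(seed)
    counts = [0] * len(t)
    for _ in range(trials):
        m = sample_uniform_simplex(len(t), rng)
        counts[bayes_optimal_outcome(t, m)] += 1
    return [c / trials for c in counts]


t = [0.5, 0.3, 0.2]  # example weights of the system state
freqs = outcome_frequencies(t)
print(freqs)
```

With a uniform observer distribution the relative frequencies should approach the weights t[k] up to sampling error, which is what the two lemmas together assert.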

6.3 Appendix C 

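Lemma 8 The mapping $\omega$ 

$$\omega : S_n \to \Delta_{n-1}$$

is measure-preserving, i.e. for two measure spaces $(\Delta_{n-1}, \mathcal{B}(\Delta_{n-1}), \mu)$ and 
$(S_n, \mathcal{B}(S_n), \nu)$ and $A \in \mathcal{B}(\Delta_{n-1})$ and $\omega^{-1}(A) \in \mathcal{B}(S_n)$, we have: 

$$\nu(\omega^{-1}(A)) = \frac{2\pi^n}{\sqrt{n}}\, \mu(A)$$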

Proof.^^ Let $A$ be an arbitrary open convex set in $\Delta_1$: $A = \{(x_1, x_2) : a < x_1 < b,\ x_2 = 1 - x_1\}$. 
Evidently, $\mu(A) = \sqrt{2}(b - a)$. Let $B$ be the pull-back of $A$ under $\omega$: 

$$B = \{(z_1, z_2) \in Z_1 \times Z_2 \subset \mathbb{C}^2 : Z_1 = \{z_1 : a < |z_1|^2 < b\},\ Z_2 = \{z_2 : z_2 = \sqrt{1 - |z_1|^2}\, e^{i\theta},\ \theta \in [0, 2\pi[\}\}$$

Clearly, 

$$\nu(B) = \nu(Z_1)\nu(Z_2) = \pi(b - a) \cdot 2\pi = \frac{2\pi^2}{\sqrt{2}}\, \mu(A)$$

Hence the theorem holds for convex sets in $\Delta_1$. This conclusion can readily be 
extended to an arbitrary $(n-1)$-dimensional rectangle set $A$ in $\Delta_{n-1}$: 



$$A = \{(x_1, \ldots, x_{n-1}, 1 - \sum_{i=1}^{n-1} x_i) : \forall i = 1, \ldots, n-1 : a_i < x_i < b_i;\ a_i, b_i \in [0, 1]\}$$



^^This proof was first presented in 0, but we include it for completeness. The author 
is grateful for a valuable hint from Wade Ramey that was helpful in proving the theorem. 






Its measure factorizes into: 



$$\mu(A) = \sqrt{n} \prod_{i=1}^{n-1} (b_i - a_i)$$



Next consider n-tuples of complex numbers: 



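$$B = \{(z_1, z_2, \ldots, z_n) \in Z_1 \times \ldots \times Z_n\}$$

$$Z_i = \{z_i \in \mathbb{C} : a_i < |z_i|^2 < b_i\},\ i \neq n,$$

$$Z_n = \{z_n : z_n = \sqrt{1 - |z_1|^2 - \ldots - |z_{n-1}|^2}\, e^{i\theta_n},\ \theta_n \in [0, 2\pi[\}$$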



Clearly $\omega(B) = A$. The measure of $B$ can be factorized as: 

$$\nu(B) = \nu(Z_1)\nu(Z_2) \ldots \nu(Z_n) = 2\pi^n \prod_{i=1}^{n-1} (b_i - a_i) = \frac{2\pi^n}{\sqrt{n}}\, \mu(A)$$

Hence the theorem holds for an arbitrary rectangle set $A \subset \Delta_{n-1}$. But every 
open set in $\Delta_{n-1}$ can be written as a pair-wise disjoint countable union of 
rectangular sets. It follows that $\nu(\omega^{-1}(\cdot)) = \frac{2\pi^n}{\sqrt{n}}\, \mu(\cdot)$ for all open sets in $\Delta_{n-1}$. 
Both $\nu$ and $\mu$ are finite Borel measures because $\Delta_{n-1}$ and $S_n$ are both compact 
subsets of a vector space of countable dimension. Therefore they must be regular 
measures ([S]: p47), which are completely defined by their behavior on open 
sets. Hence $\omega$ is measure preserving for Borel sets. ■ 
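
The content of Lemma 8 can also be probed numerically. The following sketch rests on assumptions that should be read as such: it takes the measure on $S_n$ to be the rotation-invariant one on the unit sphere of $\mathbb{C}^n$ (sampled by normalizing independent complex Gaussian vectors), and takes the mapping to send each component to its squared modulus. If the image is uniform on the simplex, the first coordinate should satisfy $P(x_1 \le c) = 1 - (1 - c)^{n-1}$. The function names are hypothetical.

```python
import random


def sample_unit_sphere_complex(n, rng):
    """Rotation-invariant random point on the unit sphere of C^n."""
    z = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    norm = sum(abs(c) ** 2 for c in z) ** 0.5
    return [c / norm for c in z]


def omega(z):
    """Map a unit vector to the simplex through its squared moduli."""
    return [abs(c) ** 2 for c in z]


def fraction_first_coordinate_below(n, c, trials=50000, seed=0):
    """Empirical probability that the first simplex coordinate is at most c."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = omega(sample_unit_sphere_complex(n, rng))
        if x[0] <= c:
            hits += 1
    return hits / trials


n, c = 3, 0.5
empirical = fraction_first_coordinate_below(n, c)
expected = 1.0 - (1.0 - c) ** (n - 1)  # uniform-simplex marginal, 0.75 here
print(empirical, expected)
```

Agreement between the two printed numbers up to sampling error is consistent with the image measure being uniform on the simplex; it does not by itself establish the proportionality constant derived in the proof.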